Understanding Robustness in Teacher-Student Setting: A New Perspective
Yang, Zhuolin, Chen, Zhaoxi, Cai, Tiffany, Chen, Xinyun, Li, Bo, Tian, Yuandong
Adversarial examples have emerged as a ubiquitous property of machine learning models, where a bounded adversarial perturbation can mislead a model into making arbitrarily incorrect predictions. Such examples provide a way to assess the robustness of machine learning models as well as a proxy for understanding the model training process. Extensive studies try to explain the existence of adversarial examples and provide ways to improve model robustness (e.g., adversarial training). While they mostly focus on models trained on datasets with predefined labels, we leverage the teacher-student framework and assume a teacher model, or oracle, provides the labels for given instances. We extend Tian (2019) to the case of low-rank input data and show that student specialization (a trained student neuron being highly correlated with a certain teacher neuron at the same layer) still happens within the input subspace, but that the teacher and student nodes can differ wildly outside the data subspace, which we conjecture leads to adversarial examples. Extensive experiments show that student specialization correlates strongly with model robustness in different scenarios, including students trained via standard training, adversarial training, confidence-calibrated adversarial training, and training with a robust feature dataset. Our studies could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
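The in-subspace/out-of-subspace split described above can be illustrated with a minimal numpy sketch. This is a linear stand-in for the ReLU teacher-student setting: the dimensions and the least-squares "student" are hypothetical choices for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 6, 2                                    # ambient input dim, data subspace rank
U = np.linalg.qr(rng.normal(size=(d, r)))[0]   # orthonormal basis of the data subspace

# Linear "teacher"; the "student" is fit only on low-rank training data.
w_teacher = rng.normal(size=d)
X = rng.normal(size=(200, r)) @ U.T            # all training inputs lie in span(U)
y = X @ w_teacher
w_student = np.linalg.lstsq(X, y, rcond=None)[0]   # min-norm least-squares fit

x_in = U @ rng.normal(size=r)                              # point inside the subspace
x_out = x_in + (np.eye(d) - U @ U.T) @ rng.normal(size=d)  # perturbed off-subspace

print(abs(x_in @ w_teacher - x_in @ w_student))    # ~0: teacher and student agree in-subspace
print(abs(x_out @ w_teacher - x_out @ w_student))  # generally large off-subspace
```

Because the minimum-norm solution only recovers the teacher's component inside the data subspace, an off-subspace perturbation exposes arbitrary disagreement between teacher and student, which is the flavor of adversarial direction the abstract conjectures.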
Understanding Diversity based Pruning of Neural Networks -- Statistical Mechanical Analysis
Acharyya, Rupam, Zhang, Boyu, Chattoraj, Ankani, Das, Shouman, Stefankovic, Daniel
Deep learning architectures with a huge number of parameters are often compressed using pruning techniques to ensure computational efficiency of inference during deployment. Despite a multitude of empirical advances, there is no theoretical understanding of the effectiveness of different pruning methods. We address this issue by setting up the problem in the statistical mechanics formulation of a teacher-student framework and deriving generalization error (GE) bounds for specific pruning methods. This theoretical premise allows comparison between pruning methods, and we use it to investigate compression of neural networks via diversity-based pruning methods. A recent work showed that a Determinantal Point Process (DPP) based node pruning method is notably superior to competing approaches when tested on real datasets. Using GE bounds in the aforementioned setup, we provide theoretical guarantees for their empirical observations. Another consistent finding in the literature is that sparse neural networks (edge pruned) generalize better than dense neural networks (node pruned) for a fixed number of parameters. We use our theoretical setup to prove that a baseline random edge pruning method performs better than the DPP node pruning method. Finally, we draw motivation from our theoretical results to propose a DPP edge pruning technique for neural networks which empirically outperforms other competing pruning methods on real datasets.
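The node-versus-edge distinction can be made concrete with a small numpy sketch. Magnitude-based selection here stands in for the paper's DPP sampling and GE analysis; the network shapes and parameter budget are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer ReLU network: hidden weights W1 (hidden x input), output weights w2.
W1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=8)

def forward(x, W1, w2):
    return np.maximum(W1 @ x, 0.0) @ w2   # ReLU hidden layer, linear output

# Node pruning: drop whole hidden units (here by smallest |w2|, a simple
# stand-in for the DPP-based diversity selection in the paper).
k = 4
keep = np.argsort(-np.abs(w2))[:k]
W1_node, w2_node = W1[keep], w2[keep]

# Edge pruning: zero the smallest-magnitude entries of W1, keeping the same
# hidden-weight budget as node pruning (k * input_dim = 16 weights survive).
budget = k * W1.shape[1]
thresh = np.sort(np.abs(W1).ravel())[::-1][budget - 1]
W1_edge = np.where(np.abs(W1) >= thresh, W1, 0.0)

x = rng.normal(size=4)
print(forward(x, W1_node, w2_node), forward(x, W1_edge, w2))
```

Note that both pruned models spend the same hidden-weight budget of 16 entries of W1, which mirrors the fixed-parameter-count comparison under which the paper contrasts edge pruning with node pruning.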
Over-parameterization as a Catalyst for Better Generalization of Deep ReLU network
To analyze deep ReLU networks, we adopt a student-teacher setting in which an over-parameterized student network learns from the output of a fixed teacher network of the same depth via Stochastic Gradient Descent (SGD). First, we prove that when the gradient is zero (or bounded above by a small constant) at every data point in training, a situation called the interpolation setting, there exists a many-to-one alignment between student and teacher nodes in the lowest layer under mild conditions. This suggests that generalization on unseen data is achievable, even though the same condition often leads to zero training error. Second, analysis of noisy recovery and training dynamics in 2-layer networks shows that strong teacher nodes (with large fan-out weights) are learned first, while subtle teacher nodes are left unlearned until a late stage of training. As a result, it could take a long time to converge to these small-gradient critical points. Our analysis shows that over-parameterization plays two roles: (1) it is a necessary condition for alignment to happen at the critical points, and (2) in training dynamics, it helps student nodes cover more teacher nodes with fewer iterations.

Although networks with even one hidden layer can fit any function (Hornik et al., 1989), it remains an open question how such networks can generalize to new data. Contrary to what traditional machine learning theory predicts, empirical evidence (Zhang et al., 2017) shows that more parameters in a neural network lead to better generalization. How over-parameterization yields strong generalization is an important question for understanding how deep learning works. In this paper, we analyze multi-layer ReLU networks by adopting a teacher-student setting. The fixed teacher network provides the output for the student to learn via SGD. The student is over-parameterized (or over-realized): it has more nodes than the teacher.
Therefore, there exist student weights whose gradient at every data point is zero. Here, we want to study the inverse problem: with small gradients at every training sample, can the student weights recover the teacher's? If so, then generalization performance can be guaranteed if training converges to such critical points. In this paper, we show that this so-called interpolation setting (Ma et al., 2017; Liu & Belkin, 2018; Bassily et al., 2018) leads to alignment: under certain conditions, each teacher node is provably aligned with at least one student node in the lowest layer. The condition is simply that the teacher node is observed by at least one student node, i.e., the teacher's ReLU boundary lies in the activation region of that student. Therefore, more over-parameterization increases the probability of teacher nodes being observed and thus being aligned. Furthermore, in the 2-layer case, student nodes that are not aligned with any teacher node have zero contribution to the output and can be pruned.
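The setting described above can be sketched as a toy numpy experiment. The widths, learning rate, and step count are hypothetical, and a 2-layer ReLU network with linear output stands in for the paper's formal construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_t, m_s = 5, 3, 12   # input dim, teacher width, over-parameterized student width

# Fixed 2-layer ReLU teacher providing the labels.
Wt = rng.normal(size=(m_t, d))
at = rng.normal(size=m_t)
teacher = lambda X: np.maximum(X @ Wt.T, 0.0) @ at

# Over-parameterized student trained with plain SGD on the teacher's outputs.
Ws = 0.5 * rng.normal(size=(m_s, d))
a_s = 0.5 * rng.normal(size=m_s)
Xtest = rng.normal(size=(64, d))
mse = lambda W, a: float(np.mean((np.maximum(Xtest @ W.T, 0.0) @ a - teacher(Xtest)) ** 2))
mse_before = mse(Ws, a_s)

lr = 0.005
for _ in range(5000):
    x = rng.normal(size=(1, d))
    h = np.maximum(x @ Ws.T, 0.0)                    # (1, m_s) student activations
    err = (h @ a_s - teacher(x)).item()              # scalar residual
    grad_a = err * h.ravel()                         # dL/da_s for loss 0.5 * err^2
    grad_W = err * a_s[:, None] * ((h > 0.0).T * x)  # dL/dWs: ReLU gate times input
    a_s -= lr * grad_a
    Ws -= lr * grad_W
mse_after = mse(Ws, a_s)

# Alignment check: cosine similarity between each teacher node and its best student node.
cos = (Wt / np.linalg.norm(Wt, axis=1, keepdims=True)) @ \
      (Ws / np.linalg.norm(Ws, axis=1, keepdims=True)).T
print(mse_before, mse_after, cos.max(axis=1))
```

With more student nodes than teacher nodes, several student weight vectors can align with the same teacher node, which is the many-to-one alignment the analysis above refers to.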
Luck Matters: Understanding Training Dynamics of Deep ReLU Networks
Tian, Yuandong, Jiang, Tina, Gong, Qucheng, Morcos, Ari
We analyze the dynamics of training deep ReLU networks and their implications for generalization capability. Using a teacher-student setting, we discovered a novel relationship between the gradient received by hidden student nodes and the activations of teacher nodes for deep ReLU networks. With this relationship and the assumption of small overlapping teacher node activations, we prove that (1) student nodes whose weights are initialized close to teacher nodes converge to them at a faster rate, and (2) in the over-parameterized, 2-layer case, while a small set of lucky nodes converge to the teacher nodes, the fan-out weights of the other nodes converge to zero. This framework provides insight into multiple puzzling phenomena in deep learning, such as over-parameterization, implicit regularization, lottery tickets, etc. We verify our assumption by showing that the majority of BatchNorm biases of pre-trained VGG11/13/16/19 models are negative.
Adaptive Back-Propagation in On-Line Learning of Multilayer Networks
West, Ansgar H. L., Saad, David
This research has been motivated by the dominance of the suboptimal symmetric phase in online learning of two-layer feedforward networks trained by gradient descent [2]. This trapping is emphasized for inappropriately small learning rates but exists in all training scenarios, affecting the learning process considerably. We proposed an adaptive back-propagation training algorithm [Eq.